AITopics | parallel tempering

Collaborating Authors

parallel tempering

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Conditional Diffusion Sampling

Castro-Macías, Francisco M., Morales-Álvarez, Pablo, Syed, Saifuddin, Hernández-Lobato, Daniel, Molina, Rafael, Hernández-Lobato, José Miguel

arXiv.org Machine LearningMay-6-2026

Sampling from unnormalized multimodal distributions with limited density evaluations remains a fundamental challenge in machine learning and natural sciences. Successful approaches construct a bridge between a tractable reference and the target distribution. Parallel Tempering (PT) serves as the gold standard, while recent diffusion-based approaches offer a continuous alternative at the cost of neural training. In this work, we introduce Conditional Diffusion Sampling (CDS), a framework that combines these two paradigms. To this end, we derive Conditional Interpolants, a class of stochastic processes whose transport dynamics are governed by an exact, closed-form stochastic differential equation (SDE), requiring no neural approximation. Although these dynamics require sampling from a non-trivial initialization distribution, we show both theoretically and empirically that the cost of this initialization diminishes for sufficiently short diffusion times. CDS leverages this by a two-stage procedure: (1) PT is used to efficiently sample the initial distribution, and then (2) samples are transported via the transport SDE. This combination couples the robust global exploration of PT with efficient local transport. Experiments suggest that CDS has the potential to achieve a superior trade-off between sample quality and density evaluation cost compared to state-of-the-art samplers.

artificial intelligence, machine learning, target distribution, (17 more...)

arXiv.org Machine Learning

2605.04013

Country: Europe (0.67)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Add feedback

Parallel Tempering With a Variational Reference

Neural Information Processing SystemsDec-23-2025, 16:45:32 GMT

Sampling from complex target distributions is a challenging task fundamental to Bayesian inference. Parallel tempering (PT) addresses this problem by constructing a Markov chain on the expanded state space of a sequence of distributions interpolating between the posterior distribution and a fixed reference distribution, which is typically chosen to be the prior. However, in the typical case where the prior and posterior are nearly mutually singular, PT methods are computationally prohibitive. In this work we address this challenge by constructing a generalized annealing path connecting the posterior to an adaptively tuned variational reference. The reference distribution is tuned to minimize the forward (inclusive) KL divergence to the posterior distribution using a simple, gradient-free moment-matching procedure. We show that our adaptive procedure converges to the forward KL minimizer, and that the forward KL divergence serves as a good proxy to a previously developed measure of PT performance. We also show that in the large-data limit in typical Bayesian models, the proposed method improves in performance, while traditional PT deteriorates arbitrarily. Finally, we introduce PT with two references---one fixed, one variational---with a novel split annealing path that ensures stable variational reference adaptation. The paper concludes with experiments that demonstrate the large empirical gains achieved by our method in a wide range of realistic Bayesian inference scenarios.

name change, parallel tempering, variational reference, (4 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.82)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.59)

Add feedback

IsingFormer: Augmenting Parallel Tempering With Learned Proposals

Bunaiyan, Saleh, Delacour, Corentin, Chowdhury, Shuvro, Lee, Kyle, Camsari, Kerem Y.

arXiv.org Artificial IntelligenceSep-30-2025

Markov Chain Monte Carlo (MCMC) underlies both statistical physics and combinatorial optimization, but mixes slowly near critical points and in rough landscapes. Parallel Tempering (PT) improves mixing by swapping replicas across temperatures, yet each replica still relies on slow local updates to change its configuration. We introduce IsingFormer, a Transformer trained on equilibrium samples that can generate entire spin configurations resembling those from the target distribution. These uncorrelated samples are used as proposals for global moves within a Metropolis step in PT, complementing the usual single-spin flips. On 2D Ising models (sampling), IsingFormer reproduces magnetization and free-energy curves and generalizes to unseen temperatures, including the critical region. Injecting even a single proposal sharply reduces equilibration time, replacing thousands of local updates. On 3D spin glasses (optimization), PT enhanced with IsingFormer finds substantially lower-energy states, demonstrating how global moves accelerate search in rugged landscapes. Finally, applied to integer factorization encoded as Ising problems, IsingFormer trained on a limited set of semiprimes transfers successfully to unseen semiprimes, boosting success rates beyond the training distribution. Since factorization is a canonical hard benchmark, this ability to generalize across instances highlights the potential of learning proposals that move beyond single problems to entire families of instances. The IsingFormer demonstrates that Monte Carlo methods can be systematically accelerated by neural proposals that capture global structure, yielding faster sampling and stronger performance in combinatorial optimization.

artificial intelligence, machine learning, proposal, (18 more...)

arXiv.org Artificial Intelligence

2509.23043

Country: North America > United States > California (0.28)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.72)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Add feedback

Generalised Parallel Tempering: Flexible Replica Exchange via Flows and Diffusions

Zhang, Leo, Potaptchik, Peter, Doucet, Arnaud, Dau, Hai-Dang, Syed, Saifuddin

arXiv.org Machine LearningFeb-14-2025

Parallel Tempering (PT) is a classical MCMC algorithm designed for leveraging parallel computation to sample efficiently from high-dimensional, multimodal or otherwise complex distributions via annealing. One limitation of the standard formulation of PT is the growth of computational resources required to generate high-quality samples, as measured by effective sample size or round trip rate, for increasingly challenging distributions. To address this issue, we propose the framework: Generalised Parallel Tempering (GePT) which allows for the incorporation of recent advances in modern generative modelling, such as normalising flows and diffusion models, within Parallel Tempering, while maintaining the same theoretical guarantees as MCMC-based methods. For instance, we show that this allows us to utilise diffusion models in a parallelised manner, bypassing the usual computational cost of a large number of steps to generate quality samples. Further, we empirically demonstrate that GePT can improve sample quality and reduce the growth of computational resources required to handle complex distributions over the classical algorithm. Sampling from a complex probability distribution π over a state-space X, whose density π(x) is only known up to a normalising constant, is a fundamental task in modern statistical inference.

artificial intelligence, kernel, machine learning, (15 more...)

arXiv.org Machine Learning

2502.10328

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Asia > Singapore (0.04)
Asia > Japan (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.87)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.68)

Add feedback

Parallel Tempering With a Variational Reference

Neural Information Processing SystemsOct-9-2024, 09:25:44 GMT

parallel tempering, posterior distribution, variational reference, (1 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.42)

Add feedback

Non-reversible Parallel Tempering for Deep Posterior Approximation

Deng, Wei, Zhang, Qian, Feng, Qi, Liang, Faming, Lin, Guang

arXiv.org Artificial IntelligenceNov-19-2022

Parallel tempering (PT), also known as replica exchange, is the go-to workhorse for simulations of multi-modal distributions. The key to the success of PT is to adopt efficient swap schemes. The popular deterministic even-odd (DEO) scheme exploits the non-reversibility property and has successfully reduced the communication cost from $O(P^2)$ to $O(P)$ given sufficiently many $P$ chains. However, such an innovation largely disappears in big data due to the limited chains and few bias-corrected swaps. To handle this issue, we generalize the DEO scheme to promote non-reversibility and propose a few solutions to tackle the underlying bias caused by the geometric stopping time. Notably, in big data scenarios, we obtain an appealing communication cost $O(P\log P)$ based on the optimal window size. In addition, we also adopt stochastic gradient descent (SGD) with large and constant learning rates as exploration kernels. Such a user-friendly nature enables us to conduct approximation tasks for complex posteriors without much tuning costs.

approximation, artificial intelligence, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2211.10837

Country:

North America > United States > Michigan (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.40)

Industry: Energy (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.69)

Add feedback

Deep Tempering

Desjardins, Guillaume, Luo, Heng, Courville, Aaron, Bengio, Yoshua

arXiv.org Machine LearningOct-1-2014

Restricted Boltzmann Machines (RBMs) are one of the fundamental building blocks of deep learning. Approximate maximum likelihood training of RBMs typically necessitates sampling from these models. In many training scenarios, computationally efficient Gibbs sampling procedures are crippled by poor mixing. In this work we propose a novel method of sampling from Boltzmann machines that demonstrates a computationally efficient way to promote mixing. Our approach leverages an under-appreciated property of deep generative models such as the Deep Belief Network (DBN), where Gibbs sampling from deeper levels of the latent variable hierarchy results in dramatically increased ergodicity. Our approach is thus to train an auxiliary latent hierarchical model, based on the DBN. When used in conjunction with parallel-tempering, the method is asymptotically guaranteed to simulate samples from the target RBM. Experimental results confirm the effectiveness of this sampling strategy in the context of RBM training.

artificial intelligence, machine learning, rbm, (17 more...)

arXiv.org Machine Learning

1410.0123

Country: Europe (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.72)

Add feedback